perf(l1): bal optimistic merkleization on validation path#6655
Conversation
|
Lines of code reportTotal lines added: Detailed view |
cea052f to
815f446
Compare
Benchmark Block Execution Results Comparison Against Main
|
01db258 to
43ae7c5
Compare
🤖 Codex Code ReviewFindings
No other consensus/gas-accounting problems stood out in the touched logic. I did not run the test suite; this is a static review of the diff. Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewNow I have a complete picture of the PR. Let me write the review. PR Review:
|
Greptile SummaryThis PR decouples BAL-merkleization from EVM execution on the
Confidence Score: 4/5Safe to merge for the validation path; correctness is gated by the existing post-execution BAL hash check and the state-root comparison against the block header, so an incorrect optimistic trie results in block rejection rather than silent state corruption. The redesign is mechanically sound: per-field optionals avoid zero-fabrication for unchanged account fields, EIP-161 normalization in Stage C correctly handles account deletion without needing an explicit removed flag, and generate_witness_from_account_updates only reads address plus added_storage.keys() so the stripped optimistic_witness is complete for its purpose. Minor structural issues remain: BalSynthesisItem.address is stored but never read after construction, BalStateWorkItem.removed is an always-false dead branch, and the parallel pre-warm opens one state_trie handle per account rather than per thread. The pre-warm block in handle_merkleization_bal_from_updates (blockchain.rs) and the BalSynthesisItem struct definition (block_access_list.rs) deserve a second look.
|
| Filename | Overview |
|---|---|
| crates/blockchain/blockchain.rs | Core change: pre-scope BAL synthesis, optional channel, handle_merkleization_bal_from_updates with parallel pre-warm; BalStateWorkItem.removed is now always false (dead branch) and timing array expanded from 6 to 7 instants in the pipeline result. |
| crates/common/types/block_access_list.rs | New BalSynthesisItem struct and synthesize_bal_updates function with 13 unit tests; BalSynthesisItem.address field is redundant (duplicates the map key). |
| crates/vm/backends/levm/mod.rs | merkleizer: Sender to Option<Sender>; BAL path skips bal_to_account_updates + send; sequential path uses expect() to enforce the invariant that non-BAL callers always provide a Sender. |
| crates/vm/backends/mod.rs | One-line signature change: merkleizer: Sender to Option<Sender> to match LEVM. |
| crates/common/types/mod.rs | Re-exports BalSynthesisItem and synthesize_bal_updates from block_access_list. |
Sequence Diagram
sequenceDiagram
participant Main as Main Thread
participant Exec as EVM Exec Thread
participant Merkle as Merkleizer Thread
participant Warm as Warmer Thread
Note over Main: exec_merkle_start captured
Main->>Main: synthesize_bal_updates(bal)
Main->>Main: build optimistic_witness
Note over Main: thread::scope entered
par Parallel threads
Main->>Exec: "spawn merkleizer=None on BAL path"
Main->>Merkle: "spawn prepared=optimistic_updates"
Main->>Warm: spawn
end
Note over Merkle: merkle_start_instant captured
Merkle->>Merkle: par_iter pre-warm open_state_trie per account
Merkle->>Merkle: Stage B parallel storage root computation
Merkle->>Merkle: Stage C per-shard state trie updates EIP-161
Merkle->>Merkle: Stage D finalize root state_trie_hash
Note over Merkle: merkle_end_instant captured
Exec->>Exec: execute_block_parallel skip bal_to_account_updates
Exec->>Exec: validate_block_access_list_hash
Note over Exec: exec_end_instant captured
Main->>Main: accumulated_updates optimistic_witness OR streaming_witness
Main->>Main: verify state_trie_hash matches block header
Note over Main: exec_merkle_end_instant captured
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
crates/common/types/block_access_list.rs:1600-1608
The `address` field duplicates the map key in `FxHashMap<Address, BalSynthesisItem>`. In `handle_merkleization_bal_from_updates` the key `addr` is always used instead of `item.address`, and in `optimistic_witness` `*addr` is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
```suggestion
#[derive(Debug, Clone, Default)]
pub struct BalSynthesisItem {
pub balance: Option<U256>,
pub nonce: Option<u64>,
pub code_hash: Option<H256>,
pub code: Option<Code>,
pub added_storage: FxHashMap<H256, U256>,
}
```
### Issue 2 of 3
crates/blockchain/blockchain.rs:319-326
On the BAL synthesis path `removed` is hardcoded to `false` and `handle_merkleization_bal_from_updates` never sets it. The `if item.removed { account_state = AccountState::default(); }` branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
```suggestion
struct BalStateWorkItem {
hashed_address: H256,
nonce: Option<u64>,
balance: Option<U256>,
code_hash: Option<H256>,
/// Pre-computed storage root from Stage B, or None to keep existing.
storage_root: Option<H256>,
```
### Issue 3 of 3
crates/blockchain/blockchain.rs:921-932
**Pre-warm opens one `state_trie` handle per account**
The rayon closure calls `open_state_trie(parent_state_root)` for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If `open_state_trie` acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that `open_state_trie` is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
Reviews (1): Last reviewed commit: "perf(l1): BAL optimistic merkleization o..." | Re-trigger Greptile
| #[derive(Debug, Clone, Default)] | ||
| pub struct BalSynthesisItem { | ||
| pub address: Address, | ||
| pub balance: Option<U256>, | ||
| pub nonce: Option<u64>, | ||
| pub code_hash: Option<H256>, | ||
| pub code: Option<Code>, | ||
| pub added_storage: FxHashMap<H256, U256>, | ||
| } |
There was a problem hiding this comment.
The
address field duplicates the map key in FxHashMap<Address, BalSynthesisItem>. In handle_merkleization_bal_from_updates the key addr is always used instead of item.address, and in optimistic_witness *addr is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
| #[derive(Debug, Clone, Default)] | |
| pub struct BalSynthesisItem { | |
| pub address: Address, | |
| pub balance: Option<U256>, | |
| pub nonce: Option<u64>, | |
| pub code_hash: Option<H256>, | |
| pub code: Option<Code>, | |
| pub added_storage: FxHashMap<H256, U256>, | |
| } | |
| #[derive(Debug, Clone, Default)] | |
| pub struct BalSynthesisItem { | |
| pub balance: Option<U256>, | |
| pub nonce: Option<u64>, | |
| pub code_hash: Option<H256>, | |
| pub code: Option<Code>, | |
| pub added_storage: FxHashMap<H256, U256>, | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/common/types/block_access_list.rs
Line: 1600-1608
Comment:
The `address` field duplicates the map key in `FxHashMap<Address, BalSynthesisItem>`. In `handle_merkleization_bal_from_updates` the key `addr` is always used instead of `item.address`, and in `optimistic_witness` `*addr` is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
```suggestion
#[derive(Debug, Clone, Default)]
pub struct BalSynthesisItem {
pub balance: Option<U256>,
pub nonce: Option<u64>,
pub code_hash: Option<H256>,
pub code: Option<Code>,
pub added_storage: FxHashMap<H256, U256>,
}
```
How can I resolve this? If you propose a fix, please make it concise.| struct BalStateWorkItem { | ||
| hashed_address: H256, | ||
| info: Option<AccountInfo>, | ||
| nonce: Option<u64>, | ||
| balance: Option<U256>, | ||
| code_hash: Option<H256>, | ||
| removed: bool, | ||
| /// Pre-computed storage root from Stage B, or None to keep existing. | ||
| storage_root: Option<H256>, |
There was a problem hiding this comment.
On the BAL synthesis path
removed is hardcoded to false and handle_merkleization_bal_from_updates never sets it. The if item.removed { account_state = AccountState::default(); } branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
| struct BalStateWorkItem { | |
| hashed_address: H256, | |
| info: Option<AccountInfo>, | |
| nonce: Option<u64>, | |
| balance: Option<U256>, | |
| code_hash: Option<H256>, | |
| removed: bool, | |
| /// Pre-computed storage root from Stage B, or None to keep existing. | |
| storage_root: Option<H256>, | |
| struct BalStateWorkItem { | |
| hashed_address: H256, | |
| nonce: Option<u64>, | |
| balance: Option<U256>, | |
| code_hash: Option<H256>, | |
| /// Pre-computed storage root from Stage B, or None to keep existing. | |
| storage_root: Option<H256>, |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 319-326
Comment:
On the BAL synthesis path `removed` is hardcoded to `false` and `handle_merkleization_bal_from_updates` never sets it. The `if item.removed { account_state = AccountState::default(); }` branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
```suggestion
struct BalStateWorkItem {
hashed_address: H256,
nonce: Option<u64>,
balance: Option<U256>,
code_hash: Option<H256>,
/// Pre-computed storage root from Stage B, or None to keep existing.
storage_root: Option<H256>,
```
How can I resolve this? If you propose a fix, please make it concise.| // Warm parent state-trie pages for all touched accounts in parallel before | ||
| // Stage B / Stage C race for them. This replaces the prefetch that the old | ||
| // streaming path got for free via `bal_to_account_updates`. | ||
| accounts | ||
| .par_iter() | ||
| .try_for_each(|(hashed_address, _)| -> Result<(), StoreError> { | ||
| let state_trie = self.storage.open_state_trie(parent_state_root)?; | ||
| let _ = state_trie.get(hashed_address.as_bytes())?; | ||
| Ok(()) | ||
| })?; | ||
|
|
||
| // === Stage B: Parallel per-account storage root computation === |
There was a problem hiding this comment.
Pre-warm opens one
state_trie handle per account
The rayon closure calls open_state_trie(parent_state_root) for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If open_state_trie acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that open_state_trie is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 921-932
Comment:
**Pre-warm opens one `state_trie` handle per account**
The rayon closure calls `open_state_trie(parent_state_root)` for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If `open_state_trie` acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that `open_state_trie` is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
How can I resolve this? If you propose a fix, please make it concise.Decouple merkleization from EVM execution when the validation path receives a BAL: synthesize per-field deltas from the input BlockAccessList pre-execution and run merkle stages B/C/D in parallel with execution + warming. validate_block_access_list_hash remains the post-execution correctness gate. Closes #6584.
10b1aa7 to
61a510f
Compare
ElFantasma
left a comment
There was a problem hiding this comment.
Three inline findings + two body observations, all minor. Not blocking.
-
validate_block_access_list_hashis the only correctness net for the optimistic merkle output, and it's gated onproduced_bal.is_some()atblockchain.rs:569-576. On the BAL path this should always fire (Amsterdam+ produces a BAL), but the correctness story is now load-bearing on a runtime invariant rather than a static one. Worth adebug_assert!(produced_bal.is_some(), "BAL validation path must produce a BAL")on the optimistic branch to surface a future regression early. -
execute_block_pipeline'sif let Some(bal) = header_baldoes NOT checkis_amsterdam(levm/mod.rs:391, pre-existing). Thedebug_assert!(is_amsterdam, …)insideexecute_block_parallelis a no-op in release. This PR adds another caller (the new optimistic merkle path) that depends on parallel execution running only on Amsterdam+. Ifadd_block_pipelineis ever called withSome(bal)on a pre-Amsterdam block in release, the parallel path runs incorrectly AND the synthesized merkle is consumed unchecked. Recommend gating the outerif let Some(bal)onis_amsterdam && header_bal.is_some(), or upgrading the inner debug_assert to a realEvmError. Practical risk is low (header BAL is feature-gated input to engine_newPayload), but the cost-to-fix is tiny.
| // Stage B / Stage C race for them. This replaces the prefetch that the old | ||
| // streaming path got for free via `bal_to_account_updates`. | ||
| accounts | ||
| .par_iter() |
There was a problem hiding this comment.
Pre-warm opens one state trie per account. Stage B and Stage C below use the canonical one-trie-per-worker pattern (line ~997 and ~1093: let state_trie = self.storage.open_state_trie(parent_state_root)?; inside each worker's body, then loop over the bin/shard). This pre-warm pass does the opposite — accounts.par_iter() opens a fresh state trie for every account, then issues one .get().
If open_state_trie is cheap (just a cursor against a shared RocksDB handle), the cost is dominated by the .get() and this is fine. If it's not (e.g. it builds a per-call cache, allocates a node arena, etc.), the warming pass becomes N opens vs Stage B's max(num_bins, 16) opens — strictly more work to warm than to execute. Either way, mirroring Stage B's pattern (rayon chunk the addresses, one trie open per chunk, batch the gets) keeps the three stages consistent and removes the question.
Non-blocking — measurable either way, but worth at least a comment noting the deliberate choice.
|
|
||
| let mut added_storage: FxHashMap<H256, U256> = FxHashMap::default(); | ||
| for sc in &account.storage_changes { | ||
| debug_assert!( |
There was a problem hiding this comment.
nit: debug_assert!(!sc.slot_changes.is_empty(), …) followed by if sc.slot_changes.is_empty() { continue; } two lines down express the same intent — one panics in debug, one defensively skips in release. If the invariant is real (canonical BAL ordering forbids empty slot_changes), the continue is unreachable and the .expect() at line 1649 should be the only fail-safe. If the invariant is defensible (a malformed BAL on the wire can produce empty slot_changes), the debug_assert is misplaced — that's a wire-format violation that should return a real MempoolError/decode error upstream, not panic in dev and silently skip in prod.
Pick one. The test at line 1936 (synthesize_skips_when_slot_changes_empty) is currently #[cfg_attr(debug_assertions, should_panic(…))] which encodes the inconsistency.
| // Sequential path (existing code, for block production and non-Amsterdam). | ||
| // The non-BAL caller always provides a Sender; the BAL path returned above. | ||
| let merkleizer = merkleizer | ||
| .expect("sequential execution path requires a merkleizer Sender (non-BAL caller)"); |
There was a problem hiding this comment.
.expect("sequential execution path requires a merkleizer Sender (non-BAL caller)") is a structural panic that depends on the early-return for the BAL branch a few hundred lines up staying exactly where it is. If a future refactor reshapes the BAL branch (e.g. extracts it into a helper, moves the early return, or introduces a third path), the contract here can silently break and the next non-BAL block on mainnet panics the EVM thread.
Cheap belt-and-suspenders: let Some(merkleizer) = merkleizer else { return Err(EvmError::Custom("non-BAL execution path called without merkleizer Sender".into())); }; — same control flow, returns instead of panics, surfaces the broken contract through the normal error path instead of bringing down the executor thread.
Summary
Decouple BAL-merkleization from EVM execution on the
engine_newPayloadvalidation path. Synthesize per-field state deltas from the inputBlockAccessListbeforethread::scopeand run merkle stages B/C/D in parallel with execution + warming.validate_block_access_list_hashpost-execution remains the correctness gate; on mismatch the optimistic merkle output is discarded by the existing?error flow.Closes #6584. Validation path only (builder still streams). Amsterdam+ only.
Design
BalSynthesisItem { address, balance, nonce, code_hash, code, added_storage }incrates/common/types/block_access_list.rscarries per-field optionals — noOption<AccountInfo>blob, so a balance-only ETH-transfer recipient doesn't fabricate zero-nonce /EMPTY_KECCACK_HASHdeltas that would corrupt the trie.BalStateWorkItemrefactored to per-field optionals so the streaming flow and the synthesis flow lower into the same shape.handle_merkleization_bal→handle_merkleization_bal_from_updates(prepared: FxHashMap<Address, BalSynthesisItem>, parent_header). Stage A (channel drain) deleted from the BAL path.execute_block_pipelinesynthesizesoptimistic_updates+optimistic_witnessbeforethread::scope. No channel + no drain thread on the BAL path.LEVM::execute_block_pipeline/execute_block_parallelandEvm::execute_block_pipelinenow takeOption<Sender>. The validation-with-BAL caller passesNoneand the EVM skips its ownbal_to_account_updates+merkleizer.send(levm/mod.rs:1034-1038) — that was redundant work (and did pre-state lookups we no longer need).generate_witness_from_account_updatesonly needsaddress+added_storage.keys(); the bulk of the witness comes fromDatabaseLogger.state_accessed.merkle_start_instantcaptured so the pipeline metric line showsstart_delay(how much of execution the merkleizer overlapped with).Perf refinements (inside
handle_merkleization_bal_from_updates)accounts.par_iter()before Stage B — warms RocksDB OS pages + trie node arena for the addresses Stage B/C are about to touch. Replaces what the old streamingbal_to_account_updateswas doing viastore.prefetch_accounts.item.added_storagebefore the per-slot insert loop so trie inserts walk node-arena paths in order instead of bucket order.Correctness
removed/removed_storageintentionally not inferred from BAL. Stage B'svalue.is_zero()+ Stage C's EIP-161 normalization collapse accounts identically to the streaming path. Verified for plain selfdestruct (EIP-6780 only emitsrecord_balance_change(to, 0)on same-tx-created accounts; pre-existing contracts degrade to a balance transfer).storage_readsare omitted from the synthesized map (no state delta to apply; witness path captures their reads fromDatabaseLogger.state_accessed).Tests
synthesize_bal_updatesinblock_access_list.rs: read-only skip, pure storage write, balance-only / nonce-only / code-only (regression cases for the partial-info bug avoided by the per-field design), last-wins on each delta vec, zero storage retained, defensive emptyslot_changes, account creation, EIP-6780 selfdestruct collapse.tooling/ef_tests/blockchain/test_runner.rs:192exercises this code path on every Amsterdam fixture (pass 2 callsadd_block_pipeline(block, Some(bal))).Follow-ups (not in this PR)
synthesize_bal_updatesitself viabal.accounts().par_iter(). Thekeccak(new_code)of each code-changed account is the serial-on-main-thread hotspot.Test plan
cargo test -p ethrex-common synthesize_tests(13/13)cargo fmt --checkcargo clippy -p ethrex-blockchain --no-deps -- -D warnings(clean; pre-existingMAX_BLOB_TX_SIZEwarning is feature-gated byc-kzgon main)balgrouptooling/ef_tests/blockchain) — exercises the new path via the two-pass parallel runner on every Amsterdam fixture